NSF PAR Search | NSF Public Access Repository

Mozart: Taming Taxes and Composing Accelerators with Shared-Memory

https://doi.org/10.1145/3656019.3676896

Suresh, Vignesh; Mishra, Bakshree; Jing, Ying; Zhu, Zeran; Jin, Naiyin; Block, Charles; Mantovani, Paolo; Giri, Davide; Zuckerman, Joseph; Carloni, Luca P; et al. (October 2024, ACM)

Full Text Available
SpikeHard: Efficiency-Driven Neuromorphic Hardware for Heterogeneous Systems-on-Chip

https://doi.org/10.1145/3609101

Clair, Judicael; Eichler, Guy; Carloni, Luca P (October 2023, ACM Transactions on Embedded Computing Systems)

Neuromorphic computing is an emerging field with the potential to offer performance and energy-efficiency gains over traditional machine learning approaches. Most neuromorphic hardware, however, has been designed with limited concern for the problem of integrating it with other components in a heterogeneous System-on-Chip (SoC). Building on a state-of-the-art reconfigurable neuromorphic architecture, we present the design of a neuromorphic hardware accelerator equipped with a programmable interface that simplifies both its integration into an SoC and its communication with the processor on the SoC. To optimize the allocation of on-chip resources, we develop an optimizer to restructure existing neuromorphic models for a given hardware architecture, and perform design-space exploration to find highly efficient implementations. We conduct experiments with various FPGA-based prototypes of many-accelerator SoCs, where Linux-based applications running on a RISC-V processor invoke Pareto-optimal implementations of our accelerator alongside third-party accelerators. These experiments demonstrate that our neuromorphic hardware, which is up to 89× faster and 170× more energy efficient after applying our optimizer, can be used in synergy with other accelerators for different application purposes.
Full Text Available
MindCrypt: The Brain as a Random Number Generator for SoC-Based Brain-Computer Interfaces

https://doi.org/10.1109/ICCD58817.2023.00021

Eichler, Guy; Seyoum, Biruk; Chiu, Kuan-Lin; Carloni, Luca P (November 2023, IEEE)

Full Text Available
EigenEdge: Real-Time Software Execution at the Edge with RISC-V and Hardware Accelerators

https://doi.org/10.1145/3576914.3587510

Chiu, Kuan-Lin; Eichler, Guy; Seyoum, Biruk; Carloni, Luca (May 2023, Cyber-Physical Systems and Internet of Things Week)

Full Text Available
PR-ESP: An Open-Source Platform for Design and Programming of Partially Reconfigurable SoCs

https://doi.org/10.23919/DATE56975.2023.10137141

Seyoum, Biruk; Giri, Davide; Chiu, Kuan-Lin; Natter, Bryce; Carloni, Luca (April 2023, Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
Accelerators & Security: The Socket Approach

https://doi.org/10.1109/LCA.2022.3179947

Piccolboni, Luca; Giri, Davide; Carloni, Luca P. (July 2022, IEEE Computer Architecture Letters)

Full Text Available
22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management

https://doi.org/10.1109/ISSCC42615.2023.10067817

Tambe, Thierry; Zhang, Jeff; Hooper, Coleman; Jia, Tianyu; Whatmough, Paul N.; Zuckerman, Joseph; Santos, Maico Cassel; Loscalzo, Erik Jens; Giri, Davide; Shepard, Kenneth; et al. (February 2023, 2023 IEEE International Solid-State Circuits Conference (ISSCC))

Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive transformer layers, each layer containing 12 parallel attention heads whose outputs concatenate to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed, e.g., a recent algorithm dynamically matches linguistic complexity with model sizes via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimizations from algorithms to circuits, where energy consumption is a key design consideration.
Full Text Available
Work-in-Progress: An Open-Source Platform for Design and Programming of Partially Reconfigurable Heterogeneous SoCs

https://doi.org/10.1109/CASES55004.2022.00019

Seyoum, Biruk B.; Giri, Davide; Chiu, Kuan-Lin; Carloni, Luca P. (January 2022, International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES))

Full Text Available
HARDROID: Transparent Integration of Crypto Accelerators in Android

https://doi.org/10.1109/HPEC49654.2021.9622875

Piccolboni, Luca; Di Guglielmo, Giuseppe; Sethumadhavan, Simha; Carloni, Luca P. (September 2021, IEEE High Performance Extreme Computing Conference (HPEC))

Full Text Available
DB4HLS: A Database of High-Level Synthesis Design Space Explorations

https://doi.org/10.1109/LES.2021.3066882

Ferretti, Lorenzo; Kwon, Jihye; Ansaloni, Giovanni; Di Guglielmo, Giuseppe; Carloni, Luca; Pozzi, Laura (December 2021, IEEE Embedded Systems Letters)

Full Text Available
